DATA 420 - Final Assignment

Class: Spring 2025 Predictive Analytics (DATA-420-DAA)

Instructor: Jason Pemberton

Student: Brandon Carine

Topic: Metro Interstate Traffic in the Twin Cities Area of Minnesota

Date: August 21, 2025


1 Introduction

Driving is by far the most common form of transportation in the United States. As populations grow and technology improves, people want to be better informed about the roads they use to commute. When will I arrive? Will the roads be backed up when I leave the house? What can I expect traffic to be like?

The data set I chose provides information on traffic, time, and weather, recorded along westbound I-94, the main highway connecting St. Paul to its twin city, Minneapolis. This project follows the CRISP-DM method, starting with Business Understanding and finishing with a conclusion.


2 Business Understanding

TwinTraffic is a fictitious company whose app serves drivers in Minnesota, with a focus on the cities of St. Paul and Minneapolis. The app offers several functions, including weather-related updates, driver-safety information, and a real-time traffic map of the two cities. The company wants to provide users with more information for each of these functions.

TwinTraffic Logo
Figure 1. TwinTraffic logo [1]

The project focuses on three questions:

Question 1: How does weather affect traffic volume, and can this be used to inform users?

Question 2: What weather-related events can be tied together to inform users about dangerous driving conditions throughout the day?

Question 3: Can we predict high traffic volumes to provide notifications to drivers before they start their commute?

These three questions will be addressed using linear and multiple regression, k-means clustering, and a decision tree.


3 Data Understanding

3.1 Loading Libraries and the Data

library(tidyverse)
library(plotly)
library(lubridate) # Used for extracting an "hour" and "day" column from the "date_time" column
library(rpart) # Used for creating the decision tree
library(rpart.plot) # Used for creating a visualization for the decision tree

# Assigning the data
metro_raw <- read_csv("~/RStudio_Projects/Metro_Interstate_Traffic_Volume.csv")

3.2 Looking at the Data

3.2.1 General Information

str(metro_raw)

Name: Metro_Interstate_Traffic_Volume.csv

Source: https://archive.ics.uci.edu/dataset/492/metro+interstate+traffic+volume [2]

File Size: 3.1 MB

Rows: 48,204

Columns: 9

Time Range: October 2, 2012 9:00 AM to September 30, 2018 11:00 PM

3.2.2 Data Dictionary

Number Column Names Type Description
1 holiday Character US National holiday observed
2 temp Number (float) Average temperature recorded in the hour, measured in Kelvin
3 rain_1h Number (float) Total rain recorded in the hour, measured in mm
4 snow_1h Number (float) Total snow recorded in the hour, measured in mm
5 clouds_all Number (int) Percentage of the sky covered by clouds
6 weather_main Character Weather described in one word
7 weather_description Character Weather described with more detail
8 date_time Date_time Date and hour of recorded data instance
9 traffic_volume Number (int) Total number of cars in the hour

3.3 Statistics

3.3.1 Summary Statistics

summary(metro_raw)

Some extreme outliers were discovered.

rain_1h - The maximum is 9831.3. Given that the measurement is in mm, it is highly unlikely that almost 10 meters of rain fell in one hour. This instance is found at row 24,873 (July 11, 2016 at 5 PM), and it will be deleted during data preparation.

temp - The minimum is 0.0. Given that the measurement is in Kelvin, this would mean Minnesota reached absolute zero, which is physically impossible. Ten such instances were discovered and will be deleted during data preparation.

snow_1h - The maximum is 0.51. Given that the measurement is in mm, this value is highly suspect: it is hard to believe Minnesota's largest hourly snowfall from 2012 to 2018 was half a millimeter. However, I will not delete this column during data preparation, because the sensor may have recorded only snow accumulation on the road surface, which melts quickly.
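As a quick sanity check on these suspect values, the following sketch (assuming metro_raw is loaded as above) locates and counts them so they can be cross-checked against the figures cited here:

```r
# Locate the implausible rain reading (~9831 mm in one hour)
which(metro_raw$rain_1h > 9000)

# Count the absolute-zero temperature readings
sum(metro_raw$temp == 0)

# Distribution of the non-zero snowfall values
summary(metro_raw$snow_1h[metro_raw$snow_1h > 0])
```

The first two results should correspond to row 24,873 and the ten instances mentioned above.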

3.3.2 Boxplot of Traffic Volume

# EDA for traffic_volume: Boxplot Visual
metro_raw %>% 
  ggplot(aes(y = traffic_volume)) +
  geom_boxplot(fill = "#52c9e8") +
  theme_minimal() +
  labs(title = "Boxplot: Traffic Volume", y = "Traffic Volume")

# EDA for traffic_volume: Boxplot Stats
traffic_boxplot <- boxplot.stats(metro_raw$traffic_volume)

Q1 <- traffic_boxplot$stats[2]
Q3 <- traffic_boxplot$stats[4]

Traffic volume has a wide distribution. Half of the data falls between 1,193 (Q1) and 4,933 (Q3) cars per hour, with a median of about 3,200. I will use the Q1 and Q3 values later in data preparation.


4 Data Preparation

Note:

ChatGPT helped me come up with a way to bin the traffic volumes.

ChatGPT also introduced me to the library lubridate, which was helpful with creating the hour and day columns.

4.1 Step 1: Bin Traffic Volume

metro <- metro_raw %>% 
  mutate(traffic_volume_bin = case_when(
    traffic_volume < Q1 ~ "Low",
    traffic_volume >= Q1 & traffic_volume < Q3 ~ "Moderate",
    traffic_volume >= Q3 ~ "High"
  ))

The traffic_volume column was binned using the quartiles in Data Understanding. This new column will be used later in the Decision Tree.

4.2 Step 2: Create New Columns

metro$hour <- hour(metro$date_time)
metro$day <- wday(metro$date_time, label = TRUE) # label = TRUE returns day names (e.g., "Sat") instead of numbers

metro <- metro %>% 
  mutate(is_weekend = case_when(
    day %in% c("Sat", "Sun") ~ 1,
    TRUE ~ 0
  ))

I created an hour column for use in K-Means. Using the day column, I created an is_weekend column for the Decision Tree as well.

4.3 Step 3: Convert Temperature

metro <- metro %>% 
  mutate(temp_c = temp - 273.15)

Converted Kelvin to Celsius for easier interpretation.

4.4 Step 4: Remove Outliers

metro <- metro[-24873, ]
metro <- metro %>%
  filter(temp != 0)

Deleted the outlier pertaining to rain (row 24,873) and the ten instances of absolute zero temperatures.

4.5 Step 5: Select Needed Columns

metro <- metro %>% 
  select(traffic_volume, traffic_volume_bin, temp_c, rain_1h, snow_1h, clouds_all, hour, is_weekend)

I selected only the columns that would be used in the Modeling portion of the project.


5 Method 1: Multiple Linear Regression

Question: How does weather affect traffic volume, and can this be used to inform users?

5.1 Modeling

I will first fit a simple linear regression of traffic_volume on each numeric weather variable individually.

5.1.1 Linear Regression

# Create First Linear Regression
temp_lm <- lm(traffic_volume ~ temp_c, data = metro)

# Assigning R-Squared, intercept, and slope of the regression
temp_r2 <- summary(temp_lm)$r.squared
temp_int <- coef(temp_lm)[1]
temp_slope <- coef(temp_lm)[2]
# Adding the linear equation text
temp_text <- paste0(
  "Traffic Volume = ", round(temp_int, 2),
  " + ", round(temp_slope, 2), " * Temperature(C)\n",
  "R^2 = ", round(temp_r2, 3)
)

#Scatterplot of First Linear Regression
metro %>%
   ggplot(aes(temp_c, traffic_volume))+
   geom_point(colour = "#002d5d")+
   geom_smooth(method = lm, se = FALSE, colour = "#52c9e8")+
   annotate("text", x = 5, y = 8000, label = temp_text, hjust = 0, size = 3)+
   labs(
       title = "Temperature(C) vs Traffic Volume",
         x = "Temperature(C)",
         y = "Traffic Volume"
       ) + 
   theme_minimal()

Positive correlation: for every one-degree (Celsius) increase in temperature, there are about 21 more cars on the road. The model is weak, with an R-squared value of 0.017.

# Create Second Linear Regression
rain_lm <- lm(traffic_volume ~ rain_1h, data = metro)

# Assigning R-Squared, intercept, and slope of the regression
rain_r2 <- summary(rain_lm)$r.squared
rain_int <- coef(rain_lm)[1]
rain_slope <- coef(rain_lm)[2]
# Adding the linear equation text
rain_text <- paste0(
  "Traffic Volume = ", round(rain_int, 2),
  " + ", round(rain_slope, 2), " * Rain(mm)\n",
  "R^2 = ", round(rain_r2, 3)
)

#Scatterplot of Second Linear Regression
metro %>%
   ggplot(aes(rain_1h, traffic_volume))+
   geom_point(colour = "#002d5d")+
   geom_smooth(method = lm, se = FALSE, colour = "#52c9e8")+
   annotate("text", x = 30, y = 7000, label = rain_text, hjust = 0, size = 3)+
   labs(
       title = "Rain(mm) vs Traffic Volume",
         x = "Rain(mm)",
         y = "Traffic Volume"
       ) + 
   theme_minimal()

Negative correlation: for every mm of rain, there are about 44 fewer cars on the road. The model is weak, with an R-squared value of 0.001.

# Create Third Linear Regression
snow_lm <- lm(traffic_volume ~ snow_1h, data = metro)

# Assigning R-Squared, intercept, and slope of the regression
snow_r2 <- summary(snow_lm)$r.squared
snow_int <- coef(snow_lm)[1]
snow_slope <- coef(snow_lm)[2]
# Adding the linear equation text
snow_text <- paste0(
  "Traffic Volume = ", round(snow_int, 2),
  " + ", round(snow_slope, 2), " * Snow(mm)\n",
  "R^2 = ", round(snow_r2, 3)
)

#Scatterplot of Third Linear Regression
metro %>%
   ggplot(aes(snow_1h, traffic_volume))+
   geom_point(colour = "#002d5d")+
   geom_smooth(method = lm, se = FALSE, colour = "#52c9e8")+
   annotate("text", x = 0.25, y = 7000, label = snow_text, hjust = 0, size = 3)+
   labs(
       title = "Snow(mm) vs Traffic Volume",
         x = "Snow(mm)",
         y = "Traffic Volume"
       ) + 
   theme_minimal()

Positive correlation: for every mm of snow, there are about 177 more cars on the road. The model is extremely weak, with an R-squared value of essentially 0.

I have two possible explanations for this:

  1. People drive slower when it is snowing, so more cars are on the road at once and traffic is more gridlocked.
  2. The model is extremely weak and therefore may not be meaningful at all.

# Create Fourth Linear Regression
cloud_lm <- lm(traffic_volume ~ clouds_all, data = metro)

# Assigning R-Squared, intercept, and slope of the regression
cloud_r2 <- summary(cloud_lm)$r.squared
cloud_int <- coef(cloud_lm)[1]
cloud_slope <- coef(cloud_lm)[2]
# Adding the linear equation text
cloud_text <- paste0(
  "Traffic Volume = ", round(cloud_int, 2),
  " + ", round(cloud_slope, 2), " * Cloud Coverage(%)\n",
  "R^2 = ", round(cloud_r2, 3)
)

#Scatterplot of Fourth Linear Regression
metro %>%
   ggplot(aes(clouds_all, traffic_volume))+
   geom_point(colour = "#002d5d")+
   geom_smooth(method = lm, se = FALSE, colour = "#52c9e8")+
   annotate("text", x = 50, y = 8000, label = cloud_text, hjust = 0, size = 3)+
   labs(
       title = "Cloud Coverage(%) vs Traffic Volume",
         x = "Cloud Coverage(%)",
         y = "Traffic Volume"
       ) + 
   theme_minimal()

Positive correlation: for every additional percent of cloud coverage, there are about 3 more cars on the road. The model is weak, with an R-squared value of 0.004.

5.1.2 Multiple Linear Regression

After seeing very weak R-squared values for each individual independent variable, I performed multiple regression with stepwise selection.

weather_model <- lm(traffic_volume ~ temp_c + rain_1h + snow_1h + clouds_all, data = metro)
best_weather_model <- step(weather_model, direction = "both")
summary(best_weather_model)

Call:
lm(formula = traffic_volume ~ temp_c + rain_1h + clouds_all, 
    data = metro)

Residuals:
    Min      1Q  Median      3Q     Max 
-3793.1 -1931.3   113.9  1641.0  4827.4 

Coefficients:
             Estimate Std. Error t value Pr(>|t|)    
(Intercept) 2868.0467    16.0337  178.88   <2e-16 ***
temp_c        22.8038     0.7109   32.08   <2e-16 ***
rain_1h      -84.2778     8.9756   -9.39   <2e-16 ***
clouds_all     4.4172     0.2314   19.09   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 1961 on 48189 degrees of freedom
Multiple R-squared:  0.02604,   Adjusted R-squared:  0.02598 
F-statistic: 429.5 on 3 and 48189 DF,  p-value: < 2.2e-16

5.1.3 Modeling Results

The best model used temperature, rainfall, and cloud coverage to predict traffic volume; stepwise selection dropped the snowfall variable because it did not improve the model.

The model is Traffic Volume = 2868.05 + (22.80 * temp_c) + (-84.28 * rain_1h) + (4.42 * clouds_all).

For every one-degree (Celsius) increase in temperature, about 23 more cars appear on the road. For every additional mm of rain, about 84 fewer cars appear. For every additional percent of cloud coverage, about 4 more cars appear.
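As an illustration, the fitted model can be applied to a hypothetical hour; the input values below are made up for demonstration only:

```r
# Predict traffic volume for a hypothetical warm, dry, partly cloudy hour
new_hour <- data.frame(temp_c = 20, rain_1h = 0, clouds_all = 40)
predict(best_weather_model, newdata = new_hour)
# By hand: 2868.05 + 22.80 * 20 + (-84.28) * 0 + 4.42 * 40 = 3501.05 cars
```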

This model is weak, with an adjusted R-squared value of 0.026.

These three independent variables, along with the intercept, are statistically significant, given that all of their p-values were well below the threshold of 0.05.

5.2 Evaluation

The final model chosen is the Traffic Volume model from the Results section of Multiple Linear Regression.

This model is weak: even the best multiple regression model indicates that the weather variables explain only about 2.6% of the variation in traffic volume. This suggests that traffic on the main highway connecting Minneapolis and St. Paul remains steady regardless of weather conditions, because many people still need to commute to work.

5.3 Deployment

For the app, I would therefore inform users that weather does not significantly predict traffic volume, while noting that this does not mean driving conditions will not be poor. This is explored further in the next method.

TwinTraffic MLR
Figure 2. TwinTraffic Weather and Traffic Information [3]

Note: My suggestion for TwinTraffic is to update how it collects snowfall data. Snowfall accumulation on a very busy road will more often than not produce low values, as the road surface is almost always heated by traffic. My recommendation is therefore to record total snowfall for the hour, ignoring melting. I believe traffic is influenced more by what people see in their weather forecasts and in the sky than by what is actually on the road.


6 Method 2: K-Means Clustering

Question:

What weather-related events can be tied together to inform users about dangerous driving conditions throughout the day?

6.1 Modeling

I will perform k-means clustering using hour, along with combinations of traffic volume, rain, and temperature.

Note: I understand that k-means performs distance calculations on numbers, while hour is technically a cyclical time variable. Most hours are fine, as their distances behave as expected; however, k-means treats hours 23 and 0 as far apart when in reality they are only one hour apart. For my analysis this is not a problem, because my goal is to find groupings based on general times of day, and I believe the model will still find useful groupings.
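If the 23-to-0 wraparound ever did matter, one common workaround (not used in this project) is to encode the hour cyclically with sine and cosine, so that hours 23 and 0 end up close together in feature space:

```r
# Map each hour onto a circle so 23:00 and 00:00 become neighbours
metro_cyclic <- metro %>%
  mutate(
    hour_sin = sin(2 * pi * hour / 24),
    hour_cos = cos(2 * pi * hour / 24)
  )
# k-means would then cluster on hour_sin and hour_cos in place of hour
```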

Also, the snow_1h variable was not considered in this method based on the points made in Data Understanding.

# Grab the data used for K-Means
metro_kmean_1 <- metro %>%
  select(hour, traffic_volume, rain_1h)

# Scale the data
metro_scaled_1 <- metro_kmean_1 %>%
  mutate(across(everything(), scale))

# Determine best K
set.seed(123)
wss_1 <- map_dbl(1:10, function(k) {
  kmeans(metro_scaled_1, centers = k, nstart = 25)$tot.withinss
})

tibble(k = 1:10, wss_1 = wss_1) %>%
  ggplot(aes(x = k, y = wss_1)) +
  geom_line() +
  geom_point() +
  scale_x_continuous(breaks = 1:10) +  # <- forces integer ticks
  labs(title = "Elbow Method for Optimal K",
       x = "Number of Clusters", y = "WCSS") +
  theme_minimal()

I selected a k-value of 4 based on the elbow plot; the curve flattens noticeably beyond that point.

# Create the model
set.seed(123)
kmeans_result_1 <- kmeans(metro_scaled_1, centers = 4, nstart = 25)

# Assign the clusters
metro_clustered_1 <- metro_kmean_1 %>%
  mutate(cluster = factor(kmeans_result_1$cluster))

# Create 3D scatterplot using plotly
plot_ly(
  data = metro_clustered_1,
  x = ~hour,
  y = ~traffic_volume,
  z = ~rain_1h,
  color = ~cluster,
  colors = c("#00204c", "#ff6b6b", "#006d4d", "#91e2ee"),
  type = "scatter3d",
  mode = "markers",
  marker = list(size = 4)
) %>%
  layout(
    title = list(text = "Clustering for Hour, Traffic, and Rain"),
    margin = list(t = 80),
    scene = list(
      xaxis = list(title = "Hour"),
      yaxis = list(title = "Traffic Volume"),
      zaxis = list(title = "Rain(mm)")
    )
  )

Red - Afternoon to night, low to moderate traffic, low rain. (Safe after-work drive)

Green - Early morning, low traffic, low rain. (Very safe, early-risers drive)

Dark Blue - Varying time, varying traffic, high rain. (Wet roads all day)

Light Blue - Morning to evening, moderate to high traffic, low rain. (Normal, busy commute)

# Grab the data used for K-Means
metro_kmean_2 <- metro %>%
  select(hour, temp_c, traffic_volume)

# Scale the data
metro_scaled_2 <- metro_kmean_2 %>%
  mutate(across(everything(), scale))

# Determine best K
set.seed(123)
wss_2 <- map_dbl(1:10, function(k) {
  kmeans(metro_scaled_2, centers = k, nstart = 25)$tot.withinss
})

tibble(k = 1:10, wss_2 = wss_2) %>%
  ggplot(aes(x = k, y = wss_2)) +
  geom_line() +
  geom_point() +
  scale_x_continuous(breaks = 1:10) +  # <- forces integer ticks
  labs(title = "Elbow Method for Optimal K",
       x = "Number of Clusters", y = "WCSS") +
  theme_minimal()

I selected a k-value of 3 based on the elbow plot; the curve flattens noticeably beyond that point.

# Create the model
set.seed(123)
kmeans_result_2 <- kmeans(metro_scaled_2, centers = 3, nstart = 25)

# Assign the clusters
metro_clustered_2 <- metro_kmean_2 %>%
  mutate(cluster = factor(kmeans_result_2$cluster))

# Create 3D scatterplot using plotly
plot_ly(
  data = metro_clustered_2,
  x = ~hour,
  y = ~temp_c,
  z = ~traffic_volume,
  color = ~cluster,
  colors = c("#00204c", "#ff6b6b", "#006d4d"),
  type = "scatter3d",
  mode = "markers",
  marker = list(size = 4)
) %>%
  layout(
    title = list(text = "Clustering for Hour, Temperature(C), and Traffic Volume"),
    margin = list(t = 80),
    scene = list(
      xaxis = list(title = "Hour"),
      yaxis = list(title = "Temperature(C)"),
      zaxis = list(title = "Traffic Volume")
    )
  )

Red - Morning to night, cold, low to high traffic. (Normal winter driving)

Green - Early morning, varying temperature, low traffic. (Very safe, early-risers drive)

Dark Blue - Morning to night, warm, low to high traffic. (Normal summer driving)

# Grab the data used for K-Means
metro_kmean_3 <- metro %>%
  select(hour, temp_c, rain_1h)

# Scale the data
metro_scaled_3 <- metro_kmean_3 %>%
  mutate(across(everything(), scale))

# Determine best K
set.seed(123)
wss_3 <- map_dbl(1:10, function(k) {
  kmeans(metro_scaled_3, centers = k, nstart = 25)$tot.withinss
})

tibble(k = 1:10, wss_3 = wss_3) %>%
  ggplot(aes(x = k, y = wss_3)) +
  geom_line() +
  geom_point() +
  scale_x_continuous(breaks = 1:10) +  # <- forces integer ticks
  labs(title = "Elbow Method for Optimal K",
       x = "Number of Clusters", y = "WCSS") +
  theme_minimal()

I selected a k-value of 4 based on the elbow plot; the curve flattens noticeably beyond that point.

# Create the model
set.seed(123)
kmeans_result_3 <- kmeans(metro_scaled_3, centers = 4, nstart = 25)

# Assign the clusters
metro_clustered_3 <- metro_kmean_3 %>%
  mutate(cluster = factor(kmeans_result_3$cluster))

# Visualization
# Create 3D scatterplot using plotly
plot_ly(
  data = metro_clustered_3,
  x = ~hour,
  y = ~temp_c,
  z = ~rain_1h,
  color = ~cluster,
  colors = c("#00204c", "#ff6b6b", "#006d4d", "#91e2ee"),
  type = "scatter3d",
  mode = "markers",
  marker = list(size = 4)
) %>%
  layout(
    title = list(text = "Clustering for Hour, Temperature(C), and Rain"),
    margin = list(t = 80),
    scene = list(
      xaxis = list(title = "Hour"),
      yaxis = list(title = "Temperature(C)"),
      zaxis = list(title = "Rain(mm)")
    )
  )

Red - Morning, all temperatures, low rain. (Safe, morning commute)

Green - All day, warm, high rain. (Wet, slippery summer days)

Dark Blue - Afternoon to night, warm, low rain. (Normal after-work, summer drive)

Light Blue - All day, cold, low rain. (Cold, winter driving)

6.2 Evaluation

The final model I chose is the Hour, Temperature, and Rain model, because the goal of this method was to identify potentially dangerous driving conditions, and this model gives the clearest picture of physical road conditions. I would have liked to use the snowfall data, as snow is a major cause of dangerous driving conditions; however, I deemed it unreliable, as mentioned before.
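One way to back this choice with a number (a sketch, not part of the original analysis, assuming the cluster package is installed) is the average silhouette width. It is computed here on a random sample, since a full distance matrix for roughly 48,000 rows would be very large:

```r
library(cluster)

# Sample 2,000 rows and measure how well-separated the clusters are
set.seed(123)
idx <- sample(nrow(metro_scaled_3), 2000)
sil <- silhouette(kmeans_result_3$cluster[idx],
                  dist(as.matrix(metro_scaled_3[idx, ])))
mean(sil[, "sil_width"]) # closer to 1 means better-separated clusters
```

The same check could be repeated for the other two candidate models to compare them on equal footing.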

6.3 Deployment

Using the Hour, Temperature, and Rain model, I can make some suggestions based on the following groupings:

Red Cluster - Morning drive will be safe, no rain expected.

Green Cluster - Summer rain is expected today, roads may be slippery, drive with caution.

Dark Blue Cluster - Warm weather and clear skies for afternoon/evening drive, safe driving conditions.

Light Blue Cluster - Cold temperatures all day, even without rain, the roads could be slick in the morning or night, drive with caution.

Red Cluster
Figure 3. TwinTraffic Notification: Red Cluster [3], [4]

Green Cluster
Figure 4. TwinTraffic Notification: Green Cluster [3], [4]

Dark Blue Cluster
Figure 5. TwinTraffic Notification: Dark Blue Cluster [3], [4]

Light Blue Cluster
Figure 6. TwinTraffic Notification: Light Blue Cluster [3], [4]


7 Method 3: Decision Tree

Question:

Can we predict high traffic volumes to provide notifications to drivers before they start their commute?

7.1 Modeling

I will create a classification decision tree using hour and is_weekend. The target variable is the binned traffic_volume. The categories are Low, Moderate, and High.

7.1.1 Predictions

# Choose what to predict
table(metro$traffic_volume_bin)

    High      Low Moderate 
   12060    12043    24090 

# Create data partitions
set.seed(123) 
train_index <- sample(1:nrow(metro), 0.7 * nrow(metro)) # randomly sample 70% of the row indices
train_data <- metro[train_index, ] # only grabs the rows with those indexes, and all columns
test_data <- metro[-train_index, ] # grabs everything except those indexed rows

# Fit a decision tree (Classification)
tree_model <- rpart(traffic_volume_bin ~ hour + is_weekend, data = train_data, method = "class")

predictions <- predict(tree_model, test_data, type = "class") # predict(model, data, type); returns the predicted class for each test row

# Compare predictions against the actual bins in the test data
confusionMatrix <- table(Predicted = predictions, Actual = test_data$traffic_volume_bin)

# Display the confusion matrix
confusionMatrix
          Actual
Predicted  High  Low Moderate
  High     2873   23      526
  Low         0 3179      172
  Moderate  723  375     6587

# Assigns the accuracy of the model to a name
accuracy <- sum(predictions == test_data$traffic_volume_bin) / nrow(test_data) * 100

# Print the accuracy
print(paste("Accuracy:", round(accuracy,3), "%"))
[1] "Accuracy: 87.419 %"

7.1.2 Plot

# Plots the model, rpart.plot(model, main = "Title you want")
rpart.plot(tree_model, type = 5, main = "Decision Tree for Traffic Volume")

7.2 Evaluation

This was a highly accurate model: after training on 70% of the data and testing on the remaining 30%, the decision tree achieved an accuracy of 87.419%. This model will therefore be used in the TwinTraffic app.

There are two leaves that result in High traffic volume.

  1. Weekdays, from 6 AM to before 10 AM (third leaf from the left).
  2. Weekdays, from 2 PM to before 6 PM (second leaf from the left).

7.3 Deployment

These leaves match up with the morning commute and the after-work rush hour. The app will trigger high-traffic alerts during these peak times.
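As a sketch of how the app could use the model (the alert wording is illustrative only):

```r
# Predict the traffic bin for a weekday 8 AM commute
commute <- data.frame(hour = 8, is_weekend = 0)
if (predict(tree_model, commute, type = "class") == "High") {
  message("TwinTraffic alert: heavy traffic expected on I-94 westbound.")
}
```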

TwinTrafficDT
Figure 7. TwinTraffic Alert [3], [4]

8 Conclusion

This project examined how weather-related and time-related variables influence traffic volume and driving conditions in the Twin Cities area from 2012 to 2018.

From the data, we found that:

  • Weather explained little of the variation in traffic volume.

  • Driving conditions formed clusters that are useful for informing app users.

  • Users could be alerted to high traffic volume based on the hour and whether it is a weekend.

With these models, TwinTraffic can make data-driven decisions and bring better information to its users.

9 References

Ref. Num. Name Link Notes
[1] TwinTraffic logo https://chatgpt.com/ AI-generated by ChatGPT (OpenAI), created on 2025-08-18.
[2] Metro_Interstate_Traffic_Volume.csv https://archive.ics.uci.edu/dataset/492/metro+interstate+traffic+volume This dataset is licensed under a Creative Commons Attribution 4.0 International (CC BY 4.0) license.
[3] Mobile phone mockup https://www.canva.com/templates/EAFHKP1CWnU-cream-minimalist-notification-reminder-message-instagram-story/ https://www.canva.com/policies/content-license-agreement/
[4] Wallpaper of Minneapolis https://commons.wikimedia.org/wiki/File:Minneapolis_%2849674291772%29.jpg This file is licensed under the Creative Commons Attribution-Share Alike 2.0 Generic license.

Note:

ChatGPT helped a lot with the final formatting of this report, guiding me with how to create tabs and organizing pictures.